Back

Journal of Biomedical Informatics

37 training papers 2019-06-25 – 2026-03-07

Top medRxiv preprints most likely to be published in this journal, ranked by match strength.

1
Augmenting Electronic Health Records for Adverse Event Detection
2026-02-11 health informatics 10.64898/2026.02.10.26345962
#1 (18.1%)
Show abstract

ObjectiveAdverse events (AEs) resulting from medical interventions are significant contributors to patient morbidity, mortality, and healthcare costs. Prediction of these events using electronic health records (EHRs) can facilitate timely clinical interventions. However, effective prediction remains challenging due to severe class imbalance, missing labels, and the complexity of EHR records. Classical machine learning approaches frequently underperform due to insufficient representation of minor...

2
Sino-US-DrugQA: A Benchmark for Evaluating Large Language Models in Cross-Jurisdictional Pharmaceutical Regulation
2026-02-17 health informatics 10.64898/2026.02.13.26346236
Top 0.1% (17.2%)
Show abstract

Cross-jurisdictional pharmaceutical compliance requires comparative analysis of regulatory requirements across jurisdictions such as the US FDA and Chinas NMPA. Although large language models (LLMs) are increasingly explored for healthcare-related applications, their performance in cross-jurisdictional regulatory comparison has not been systematically characterized using dedicated benchmarks. This study introduces Sino-US-DrugQA, a bilingual benchmark dataset designed to evaluate LLM performance...

3
PhenoSS: Phenotype semantic similarity-based approach for rare disease prediction and patient clustering
2026-03-02 health informatics 10.64898/2026.02.26.26347219
Top 0.2% (12.0%)
Show abstract

ObjectiveSystematic clinical phenotyping using Human Phenotype Ontology (HPO) is central to rare disease diagnosis. However, current disease prioritization (ranking candidate diseases from HPO for a patient) methods face key challenges: they often fail to account for the hierarchical structure of HPO terms, ignore dependencies among correlated terms, and do not adjust for batch effects arising from systematic differences in phenotype documentation across cohorts, institutions, or clinicians. We ...

4
Identifying Reasons for ACEI/ARB Non-Use in CKD Using Scalable Clinical NLP with Schema-Guided LLM Augmentation
2026-02-12 health informatics 10.64898/2026.02.10.26346025
Top 0.3% (12.0%)
Show abstract

IMPORTANCEAlthough angiotensin-converting enzyme inhibitors (ACEIs) and angiotensin receptor blockers (ARBs) are recommended for people with chronic kidney disease (CKD), they remain underused. Barriers to adherence, such as adverse effects or patient refusal, are frequently embedded within unstructured clinical narratives and are therefore inaccessible to structured data analytics. Scalable natural language processing (NLP) approaches are needed to identify these barriers and support guideline-...

5
Interpretable Fine-tuned Large Language Models Facilitate Making Genetic Test Decisions for Rare Diseases
2026-03-02 health informatics 10.64898/2026.02.26.26347223
Top 0.3% (11.8%)
Show abstract

Clinical decision making often relies on expert judgment guided by established guidelines, which can be challenging to standardize and abstract to implement. For example, selecting between gene panels and whole exome/genome sequencing (WES/WGS) for rare disease diagnosis frequently requires interpretation of evidence-based recommendations from the American College of Medical Genetics and Genomics (ACMG) guideline. Traditional machine learning (ML) models predicting suitable genetic tests often f...

6
Understanding Clinician Edits to Ambient AI Draft Notes: A Feasibility Analysis Using Large Language Models
2026-03-02 health informatics 10.64898/2026.02.27.26347290
Top 0.3% (11.8%)
Show abstract

Ambient AI documentation tools generate draft notes that clinicians can review and edit before signing off in electronic health records. Scalable computational approaches to characterize how clinicians modify drafts remain limited, yet are essential for evaluating and improving AI effectiveness. We examined the feasibility of a few-shot prompted large language model (LLM) for categorizing sentence-level edits between AI drafts and final documentation. We developed five label-specific binary mode...

7
Combining phenotypic similarity and network propagation to improve performance and clinical consistency of rare disease diagnosis
2026-02-17 health informatics 10.64898/2026.02.15.26346357
Top 0.3% (11.2%)
Show abstract

Achieving timely diagnosis for rare diseases remains challenging due to, among others, phenotypic heterogeneity and incomplete clinical data. While the Solve-RD project developed a phenotype-based gene prioritisation method, this approach did not account for the clinical consistency among related diseases in Orphanets hierarchical classifications. We present a phenotype-based computational pipeline that ranks candidate ORPHAcodes based on patient phenotypes. The pipeline computes patient-diseas...

8
Leveraging Generative Artificial Intelligence for Enhanced Data Augmentation in Emotion Intensity Classification: A Comprehensive Framework for Cross-Dataset Transfer Learning
2026-03-03 health informatics 10.64898/2026.02.23.26346928
Top 0.4% (10.9%)
Show abstract

Data scarcity and stylistic heterogeneity pose major challenges for emotion intensity classification. This paper presents a cross-dataset augmentation framework that leverages prompt-conditioned generative models alongside deterministic and heuristic transformations to synthesize target-style examples for improved transfer learning. We introduce a unified taxonomy of augmentation strategies--Heuristic Lexical Perturbation (HLA), Prompt-Conditioned Generative Augmentation (CGA), Sequential Hybrid...

9
Federated penalized piecewise exponential model for horizontally distributed survival data: FedPPEM
2026-02-12 health informatics 10.64898/2026.02.11.26346054
Top 0.4% (10.8%)
Show abstract

Cox proportional hazard regressions are frequently employed to develop prognostic models for time-to-event data, considering both patient-specific and disease-specific characteristics. In high-dimensional clinical modeling, these biological features can exhibit high collinearity due to inter-feature relationships, potentially causing instability and numerical issues during estimation without regularization. For rare diseases such as acute myeloid leukemia (AML), the sparsity and scarcity of data...

10
Medical concept understanding in large language models is fragmented
2026-03-05 health informatics 10.64898/2026.03.03.26347552
Top 0.5% (9.0%)
Show abstract

Large language models (LLMs) perform strongly across a wide range of medical applications, yet it remains unclear whether such success reflects genuine understanding of medical concepts. We present an ontology-grounded, concept-centered evaluation of medical concept understanding in LLMs. Using 6,252 phenotype concepts from Human Phenotype Ontology, we decompose concept understanding into three core dimensions--concept identity, concept hierarchy, and concept meaning--and design corresponding be...

11
Enhancing Prediabetes Diagnosis from Continuous Glucose Monitoring Data via Iterative Label Cleaning and Deep Learning
2026-03-05 health informatics 10.64898/2026.03.04.26347604
Top 0.7% (8.0%)
Show abstract

As of early 2026, over 115 million US adults (more than 1 in 3) have prediabetes, a condition with an annual conversion rate of 5%-10% to type 2 diabetes. Total diabetes (diagnosed and undiagnosed) affects approximately 40.1 million Americans, or 12% of the population, with roughly 1.5 million new cases diagnosed annually. Continuous Glucose Monitoring (CGM) provides real-time, 24/7 insights into glycemic variability, detecting dangerous highs, lows, and trends that HbA1c (a 3-month average) mis...

12
Representation Before Retrieval: Structured Patient Artifacts Reduce Hallucination in Clinical AI Systems
2026-02-16 health informatics 10.64898/2026.02.13.26346256
Top 0.7% (7.9%)
Show abstract

BackgroundLarge language models show promise for clinical decision support, yet their propensity for hallucination--generating plausible but unsupported claims--poses sub-stantial patient safety risks. Retrieval-augmented generation (RAG) is widely assumed to mitigate this problem by grounding outputs in retrieved documents, but this assumption remains inadequately tested in clinical contexts where information density, temporal complexity, and safety stakes are uniquely high. MethodsWe develope...

13
PrivateBoost: Privacy-Preserving Federated Gradient Boosting for Cross-Device Medical Data
2026-02-12 health informatics 10.64898/2026.02.10.26345891
Top 0.7% (7.9%)
Show abstract

Cross-device medical federated learning--where individual patients participate directly rather than institutions--poses a unique challenge: each client holds only a few samples, often just one (e.g., a single diagnostic record), leaving insufficient local data for gradient computation. Existing approaches, such as Secure Aggregation, require client-to-client coordination impractical for intermittently available mobile devices, while homomorphic encryption introduces substantial computational ove...

14
GEN-KnowRD: Reframing AI for Rare Disease Recognition
2026-03-03 health informatics 10.64898/2026.03.02.26347469
Top 0.8% (7.8%)
Show abstract

Rare diseases affect over 300 million people worldwide, yet patients often endure years-long diagnostic delays that limit timely intervention and trial opportunities. Computational rare disease recognition (RDR) remains constrained by knowledge resources that are often incomplete, heterogeneous, and dependent on extensive multi-disciplinary expert curation that cannot scale. Large language models (LLMs) applied directly for end-to-end diagnosis or disease discrimination face similar knowledge bo...

15
How Agent Role Structure Alters Operating Characteristics of Large Language Model Clinical Classifiers: A Comparative Study of Specialist and Deliberative Multi-Agent Protocols
2026-02-24 health informatics 10.64898/2026.02.22.26346818
Top 0.8% (7.8%)
Show abstract

Large language models (LLMs) are increasingly deployed in structured clinical decision support, yet the architectural effects of internal role decomposition within multi-agent systems remain poorly isolated. Prior comparisons of single-agent and multi-agent prompting frequently confound workflow structure with changes in model configuration, training, or decoding. We present a controlled architectural study of role-structured inference under fixed model parameters, isolating internal role decomp...

16
An LLM-assisted framework for accelerated and verifiable clinical hypothesis testing from electronic health records
2026-02-12 health informatics 10.64898/2026.02.10.26346008
Top 0.8% (7.8%)
Show abstract

Acquiring insights from electronic health records (EHRs) is slowed by manual analytical workflows that limit scalability and reproducibility. We present LATCH (LLM-Assisted Testing of Clinical Hypotheses), an agentic framework that converts natural language clinical hypotheses into fully auditable analyses on structured EHR data. LATCH integrates LLM-assisted semantic layers with deterministic execution pipelines to automate cohort construction, statistical analysis, and result reporting, while ...

17
On the robustness of medical term representations in locally deployable language models
2026-02-26 health informatics 10.64898/2026.02.24.26346972
Top 0.9% (7.5%)
Show abstract

Structured AbstractO_ST_ABSBackgroundC_ST_ABSHosting large language models (LLMs) on-premises can secure patient data but requires compact architectures to function on standard hardware. The impact of such constraints on the robustness of their representations for medical terminology is important for clinical AI safety but poorly understood. The statistical nature of LLM training inherently limits the representation of terms with low societal prominence or lexical frequency, and high ambiguity. ...

18
Red-Teaming Medical AI: Systematic Adversarial Evaluation of LLM Safety Guardrails in Clinical Contexts
2026-03-05 health informatics 10.64898/2026.02.26.26347212
Top 1.0% (7.0%)
Show abstract

BackgroundLarge language models (LLMs) are increasingly deployed in medical contexts as patient-facing assistants, providing medication information, symptom triage, and health guidance. Understanding their robustness to adversarial inputs is critical for patient safety, as even a single safety failure can lead to adverse outcomes including severe harm or death. ObjectiveTo systematically evaluate the safety guardrails of state-of-the-art LLMs through adversarial red-teaming specifically designe...

19
Show Your Work: Verbatim Evidence Requirements and Automated Assessment for Large Language Models in Biomedical Text Processing
2026-03-04 health informatics 10.64898/2026.03.03.26346690
Top 1% (6.8%)
Show abstract

PurposeLarge language models (LLMs) are used for biomedical text processing, but individual decisions are often hard to audit. We evaluated whether enforcing a mechanically checkable "show your work" quote affects accuracy, stability, and verifiability for trial eligibility-scope classification from abstracts. MethodsWe used 200 oncology randomized controlled trials (2005 - 2023) and provided models with only the title and abstract. Trials were labeled with whether they allowed for the inclusio...

20
Handling onset age inconsistencies in longitudinal healthcare survey data
2026-02-23 health informatics 10.64898/2026.02.20.26346741
Top 1% (6.6%)
Show abstract

AO_SCPLOWBSTRACTC_SCPLOWLongitudinal healthcare surveys frequently contain inconsistencies in self-reported onset ages, where participants report different ages for the same condition between enrollment and follow-up surveys. We propose two methods to handle this challenge. First, we introduce a procedure that aggregates inconsistency patterns to construct participant-level reliability scores, enabling researchers to stratify participants and prioritize analysis on high-reliability cohorts. Seco...